3. Modules/Packages/Libraries

This section presents some useful Python libraries:

  • Matplotlib
  • NumPy
  • Pandas
  • SciKit Learn
  • SciPy
  • Plotly

Note: a library is a collection of packages, but I have a habit of not differentiating between the two.

Matplotlib

Pie Chart

In [1]:
import matplotlib.pyplot as plt

# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice (i.e. 'Hogs')

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
In [2]:
# changing the explode parameters
explode = (0, 0, 0.3, 0)  # only "explode" the 3rd slice (i.e. 'Dogs')

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
In [3]:
# function to modify explode parameters
def explode_pie(a, b, c, d):
    explode = (a, b, c, d)  # offset for each slice, in the same order as labels

    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
            shadow=True, startangle=90)
    ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

    plt.show()
    
# explode pie!
explode_pie(0, 0, 0.25, 0)

More Matplotlib example visualizations are available at matplotlib.org.

NumPy

NumPy is a library containing array and matrix data structures as well as methods to operate on these structures.

One thing to note is that NumPy is really fast: its array operations run in pre-compiled C code, and vectorized operations apply to a whole array at once instead of looping over the elements in Python.
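
As a minimal sketch of what vectorization looks like in practice (the array contents and variable names here are made up for illustration), the same arithmetic can be written as a Python loop or as a single NumPy expression:

```python
import numpy as np

# Hypothetical data: one million floating-point values.
values = np.arange(1_000_000, dtype=np.float64)

# Loop version: the Python interpreter handles one element at a time.
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] * values[i]

# Vectorized version: one expression, with the loop running in compiled code.
squared_vectorized = values * values

# Matrix-vector product using the @ operator.
matrix = np.array([[1, 2], [3, 4]])
vector = np.array([10, 20])
product = matrix @ vector
```

Both versions produce the same array; the vectorized one is typically much faster, though the exact speedup depends on the machine and the array size.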

Pandas

Basics of Pandas:

  • Powerful data frame structure for PANel DAta
  • Built on NumPy and highly compatible with Matplotlib
  • Great for storing data that is read in from a file, or for holding data immediately before it is written to a file
In [4]:
import pandas as pd

# create python dictionary containing data
data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}

# pass data to pandas data frame
purchases = pd.DataFrame(data)
purchases
Out[4]:
   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2
In [5]:
# name rows of data frame
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases
Out[5]:
        apples  oranges
June         3        0
Robert       2        3
Lily         0        7
David        1        2
In [6]:
# locate June's order
purchases.loc['June']
Out[6]:
apples     3
oranges    0
Name: June, dtype: int64
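
The read/write use mentioned in the basics above can be sketched roughly as follows. To keep the example self-contained it writes the CSV to an in-memory buffer (`io.StringIO`) rather than a real file; that choice, and the name `restored`, are assumptions of this sketch.

```python
import io

import pandas as pd

# Same purchase data as in the cells above.
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# Write the data frame as CSV text (a file path would work the same way).
buffer = io.StringIO()
purchases.to_csv(buffer)

# Read it back, using the first column as the row labels.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
```

With a real file, `purchases.to_csv('purchases.csv')` and `pd.read_csv('purchases.csv', index_col=0)` would follow the same pattern (the file name is hypothetical).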

SciKit Learn

SciKit Learn is one of the primary machine learning packages. Another is TensorFlow.
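
To give a feel for the typical fit/predict workflow, here is a minimal sketch using a logistic regression. The toy data (hours studied versus pass/fail) is invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: hours studied (feature) and pass/fail outcome (label).
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit the model, then predict classes and class probabilities for new inputs.
model = LogisticRegression()
model.fit(X, y)

new_hours = np.array([[1.0], [4.0]])
predictions = model.predict(new_hours)
probabilities = model.predict_proba(new_hours)

# The fitted coefficient and intercept are exposed as attributes.
coefficient = model.coef_
intercept = model.intercept_
```

Most SciKit Learn estimators follow this same pattern of creating a model object, calling `fit`, and then calling `predict`.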

SciKit Learn Article

An article I found by a credit risk modeler pointed to some of the pros and cons of Python:

  • (+) Python is fast and flexible
  • (+) There is a library for just about anything
  • (-) The Python ecosystem is sometimes fragmented
  • (-) If you are unaware of which library to use, you may have to code your way through it

The author gave examples of the cons based on their experience:

  • Had to calculate p-values separately when using Scikit-Learn for logistic regression, since the model does not report them
  • No built-in stepwise selection feature - had to create a custom loop
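
I don't have the author's loop, but a custom selection loop of this kind might look roughly like the sketch below. It is a greedy forward selection that uses cross-validated accuracy as the criterion, which is my assumption (a classical stepwise procedure would more likely use p-values). The function name `forward_select` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def forward_select(X, y, max_features):
    """Greedily add the column that most improves cross-validated accuracy."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_score = -np.inf

    while remaining and len(selected) < max_features:
        # Score every candidate column when added to the current selection.
        scores = {}
        for col in remaining:
            candidate = selected + [col]
            model = LogisticRegression(max_iter=1000)
            scores[col] = cross_val_score(model, X[:, candidate], y, cv=5).mean()

        best_col = max(scores, key=scores.get)
        if scores[best_col] <= best_score:
            break  # no remaining column improves the score, so stop early

        best_score = scores[best_col]
        selected.append(best_col)
        remaining.remove(best_col)

    return selected, best_score


# Synthetic data, generated only to exercise the loop.
X, y = make_classification(n_samples=200, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)
selected, score = forward_select(X, y, max_features=3)
```

The loop returns the indices of the chosen columns and the best cross-validated accuracy it reached.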

Moral of the story: there are tradeoffs to using Python, and we may prefer to continue using more familiar tools where Python is not required.

SciPy

SciPy expands on the linear algebra capabilities of NumPy. If a matrix-related feature is not available in NumPy, or if the available NumPy feature is slow, SciPy might provide a solution.
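
One example, as far as I know, is LU decomposition, which `scipy.linalg` provides and `numpy.linalg` does not. The sketch below factors a small made-up matrix and rebuilds it from the factors.

```python
import numpy as np
from scipy.linalg import lu

# Made-up 3x3 matrix for illustration.
A = np.array([[4.0, 3.0, 2.0],
              [6.0, 3.0, 1.0],
              [2.0, 5.0, 7.0]])

# Factor A into a permutation matrix P, a lower-triangular L,
# and an upper-triangular U such that A = P @ L @ U.
P, L, U = lu(A)

# Multiplying the factors back together recovers A (up to rounding).
reconstructed = P @ L @ U
```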

Plotly

Plotly is another graphing library with lots of interactive features.

Bar Chart

In [7]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=[2, 1, 4, 3]))
fig.add_trace(go.Bar(y=[1, 4, 3, 2]))
fig.update_layout(title='Hello Figure')
fig.show()

Bubble Plot

In [8]:
import plotly.express as px
df = px.data.gapminder().query("year==2007")
fig = px.scatter_geo(df, locations="iso_alpha", color="continent",
                     hover_name="country", size="pop",
                     projection="natural earth")
fig.show()